Sun NFS: Design, Implementation, Experience

Sandberg

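Distributed file system protocol built on top of Sun RPC / XDR. Designed for portability across machine architectures and operating systems: clients and servers can be heterogeneous. On the client, NFS sits beneath the VFS (virtual file system) interface, so remote files look like local ones to programs. Files are represented by vnodes.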
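
XDR is what makes the heterogeneity work. Every value goes on the wire in one canonical form: integers as 4 big-endian bytes, and strings as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The byte order of the sending machine doesn't matter. The Python sketch below shows those two encodings. The helper names are hypothetical and not part of any Sun library.

```python
import struct

def xdr_uint(n: int) -> bytes:
    # XDR unsigned int: always 4 bytes, big-endian, regardless of host byte order.
    return struct.pack(">I", n)

def xdr_string(s: str) -> bytes:
    # XDR string: 4-byte length, the bytes, then zero padding to a multiple of 4.
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad

def xdr_unpack_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    # Returns the decoded string and the offset just past its padding.
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    pad = (4 - length % 4) % 4
    return data.decode("utf-8"), start + length + pad
```

For example, a filename argument such as "passwd" (6 bytes) is sent as a length of 6 plus 2 bytes of padding, 12 bytes in total. Any machine can decode it the same way.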

Goals:

All operations are synchronous RPCs: the client blocks until the server replies. The server must commit modified data to stable storage before returning results. This costs performance, but it means that once a client has a reply, the data survives a server crash.
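
The commit-before-reply rule can be sketched as a server-side write handler that forces the data to disk with fsync before it builds the reply. This is a minimal Python sketch. It assumes the handler receives a local path instead of a real file handle; the function name and reply fields are hypothetical.

```python
import os

def nfs_write(path: str, offset: int, data: bytes) -> dict:
    """Hypothetical server-side WRITE handler: reply only after data is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, data, offset)   # apply the write at the requested offset
        os.fsync(fd)                  # force data and metadata to stable storage
        st = os.fstat(fd)
    finally:
        os.close(fd)
    # Only now is it safe to reply: the client may discard its copy of the data.
    return {"status": "OK", "size": st.st_size}
```

The fsync on every request is where the performance cost comes from. A server that replied before syncing would be faster, but it could lose data that a client believes is safe.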

Protocol is Stateless

+ crash recovery is easy (no crash detection needed: if the server crashes, clients keep resending their requests until it comes back; if a client crashes, the server has nothing to clean up)

- can't implement stateful operations like locking (so there is no consistency guarantee: two processes can write the same file at the same time)

- client can’t tell difference between a server that has crashed or one that is just slow
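
Statelessness is why crash recovery is easy. Each request names the file handle, offset, and count explicitly, so a rebooted server can answer a retried request without any recovery protocol. The toy Python sketch below assumes an in-memory dict stands in for the server's disk; the class and names are hypothetical.

```python
class StatelessServer:
    """Hypothetical in-memory server: all state lives on 'disk', none per client."""
    def __init__(self, disk):
        self.disk = disk  # file handle -> bytes; survives a simulated reboot

    def read(self, fh, offset, count):
        # Every request carries everything needed: handle, offset, count.
        return self.disk[fh][offset:offset + count]

disk = {b"fh1": b"hello, stateless world"}
server = StatelessServer(disk)
first = server.read(b"fh1", 0, 5)

# Simulate a crash and reboot: a brand-new server object over the same disk.
server = StatelessServer(disk)
# The client just retries with explicit arguments; nothing has to be reopened.
second = server.read(b"fh1", 7, 9)
```

Because a read carries its own offset, repeating it returns the same bytes. A client can therefore safely resend a request whose reply was lost.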

NFS calls -> RPC -> UDP -> IP

NFS calls: null, lookup, create, remove, getattr, setattr, read, write, rename, link, symlink, readlink, mkdir, rmdir, readdir, statfs.

Most operations take an opaque (to the client) file or directory handle.
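
lookup takes a directory handle and a single name. A client therefore resolves a full path by walking it one component at a time, treating each returned handle as an opaque token it passes back to the server. The minimal Python sketch below uses a hypothetical in-memory server with made-up handles.

```python
class ToyServer:
    """Hypothetical server: maps (directory handle, name) -> child handle."""
    def __init__(self):
        self.dirs = {
            b"root": {"usr": b"h-usr"},
            b"h-usr": {"lib": b"h-lib"},
            b"h-lib": {"libc.a": b"h-libc"},
        }

    def lookup(self, dirfh: bytes, name: str) -> bytes:
        # Resolves exactly one path component, like lookup(dirfh, name).
        try:
            return self.dirs[dirfh][name]
        except KeyError:
            raise FileNotFoundError(name) from None

def resolve(server, root_fh: bytes, path: str) -> bytes:
    """Client-side walk: one lookup per component; handles are never inspected."""
    fh = root_fh
    for component in path.strip("/").split("/"):
        if component:
            fh = server.lookup(fh, component)
    return fh
```

Resolving "/usr/lib/libc.a" this way costs three lookup calls. Each one is independent, which fits the stateless design.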

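MOUNT is a separate protocol used to mount remote filesystems: given a path on the server, it returns the root file handle of that filesystem. Keeping it separate from NFS makes it easier to plug in different access-control / authentication schemes (e.g., Kerberos).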
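
The access check lives in MOUNT, so the NFS calls themselves only ever see file handles. The minimal sketch below assumes a simple host-based export table; the table, handles, and function name are hypothetical. A Kerberos-style check could replace the set lookup without touching the NFS calls.

```python
# Hypothetical export table: exported path -> client hosts allowed to mount it.
EXPORTS = {"/export/home": {"client-a", "client-b"}}
# Hypothetical root handle the server hands out for each export.
ROOT_HANDLES = {"/export/home": b"fh-home-root"}

def mnt(client_host: str, dirpath: str) -> bytes:
    """MOUNT-style call: check access, then return the export's root file handle."""
    allowed = EXPORTS.get(dirpath)
    if allowed is None:
        raise FileNotFoundError(dirpath)
    if client_host not in allowed:
        raise PermissionError(f"{client_host} may not mount {dirpath}")
    return ROOT_HANDLES[dirpath]
```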

file handle = (inode number, inode generation number, filesystem id). The generation number is required because a file can be deleted and its inode number reused; without it, an old handle held by a client would silently refer to the new file.
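
The Python sketch below shows how the generation number turns reuse into a detectable error. The server compares the handle's generation with the inode's current one and rejects a mismatch as stale. The byte layout, exception, and table are hypothetical; real handles are opaque blobs whose format only the server knows.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    fsid: int        # which exported filesystem
    inode: int       # inode number within that filesystem
    generation: int  # bumped each time the inode is reused

    def pack(self) -> bytes:
        # Hypothetical layout: three big-endian 32-bit fields; opaque to clients.
        return struct.pack(">III", self.fsid, self.inode, self.generation)

    @staticmethod
    def unpack(raw: bytes) -> "FileHandle":
        return FileHandle(*struct.unpack(">III", raw))

class StaleHandle(Exception):
    pass

# Server-side table: (fsid, inode) -> current generation number.
inode_generation = {(1, 42): 7}

def validate(raw: bytes) -> FileHandle:
    fh = FileHandle.unpack(raw)
    if inode_generation.get((fh.fsid, fh.inode)) != fh.generation:
        raise StaleHandle(fh)  # inode freed or reused since the handle was issued
    return fh
```

If inode 42 is deleted and reallocated with generation 8, a client still holding the generation-7 handle gets a stale-handle error instead of another file's data.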

hard-mounted vs. soft-mounted filesystems: a hard mount keeps retrying an operation for as long as the server is down; a soft mount gives up after a number of retries and returns an error. Hard mounts can cause long hangs; soft mounts can make client applications misbehave or crash, because many applications don't check return codes.
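
The two policies differ only in when the client's retry loop stops. The minimal Python sketch below treats a timeout as the only failure signal and leaves out the timeout backoff a real client would use. The function and exception names are hypothetical.

```python
import itertools

class ServerDown(Exception):
    pass

def call_with_retry(rpc, hard: bool, max_tries: int = 3):
    """Retry an RPC: hard mounts retry forever, soft mounts stop after max_tries."""
    attempts = itertools.count(1) if hard else range(1, max_tries + 1)
    for _ in attempts:
        try:
            return rpc()
        except TimeoutError:
            continue  # a slow server and a dead server look the same here
    # Only reachable on a soft mount; the caller must now handle an error code.
    raise ServerDown(f"gave up after {max_tries} tries")
```

With a server that misses four requests, the hard-mount call eventually succeeds, having blocked the whole time. The soft-mount call raises after three tries. That error is exactly what a careless application may ignore.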

Issues

implementation allows authentication mechanism to be plugged in.

YP (Yellow Pages) service used to provide a flat uid, gid namespace across the network.

root on a client should not automatically be root on the server, so requests arriving with uid 0 are mapped to user "nobody" and get only nobody's permissions on that server's files.
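
On the server, this mapping is a small translation applied to the credentials in each request before any permission check. The Python sketch below assumes 65534 as the "nobody" uid, a common modern choice that varies by system, and leaves the gid untouched. The function name is hypothetical.

```python
NOBODY_UID = 65534  # assumption: a common "nobody" uid; the value varies by system

def map_credentials(uid: int, gid: int) -> tuple[int, int]:
    # A client's root (uid 0) gets only "nobody" rights on this server.
    if uid == 0:
        return NOBODY_UID, gid  # this sketch leaves the gid unchanged
    # Other uids pass through as-is; this relies on the flat, network-wide uid space.
    return uid, gid
```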

no locking: it requires server state, and no one agrees on a locking mechanism anyway.

this means that consistency is not guaranteed in NFS: two people can write a file at the same time! however, the server locks the inode for the duration of a single write request, so data from concurrent writes is never intermixed within one write.
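
The Python sketch below illustrates per-request atomicity. Each write request holds its inode's lock while it copies data, so two concurrent writes to the same region produce one writer's bytes or the other's, never a mix. This is not locking in the usual sense: nothing stops the second write from overwriting the first. The class is hypothetical.

```python
import threading
from collections import defaultdict

class InodeStore:
    """Hypothetical server store where each write request holds its inode's lock."""
    def __init__(self):
        self.data = defaultdict(bytearray)        # inode number -> file contents
        self.locks = defaultdict(threading.Lock)  # inode number -> its lock
        self._table_lock = threading.Lock()

    def _lock_for(self, inode):
        with self._table_lock:  # make sure only one lock is ever created per inode
            return self.locks[inode]

    def write(self, inode: int, offset: int, payload: bytes) -> None:
        with self._lock_for(inode):
            buf = self.data[inode]
            end = offset + len(payload)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            # Copy byte by byte: without the lock, two writers could interleave here.
            for i, b in enumerate(payload):
                buf[offset + i] = b
```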

UNIX file semantics: a file can be deleted, or its permissions changed, while it is open. NFS handles the delete case on the client: if remove is called on a file that the client still has open, the client renames it to a temporary name instead, and removes that temporary file on the last close.
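
The client-side bookkeeping for this trick can be sketched in Python as below. The client counts local opens, turns a remove of an open file into a rename, and issues the real remove on the last close. The class, the server interface (rename, remove), and the temporary-name scheme are hypothetical.

```python
import itertools

class ClientFileTable:
    """Hypothetical client-side bookkeeping for remove-while-open."""
    _counter = itertools.count()

    def __init__(self, server):
        self.server = server  # any object with rename(old, new) and remove(name)
        self.open_count = {}  # name -> number of local opens
        self.hidden = {}      # original name -> temporary name it was renamed to

    def open(self, name):
        self.open_count[name] = self.open_count.get(name, 0) + 1

    def remove(self, name):
        if self.open_count.get(name, 0) > 0:
            tmp = f".nfs{next(self._counter):04d}"  # hypothetical temp-name scheme
            self.server.rename(name, tmp)           # file stays alive on the server
            self.hidden[name] = tmp
        else:
            self.server.remove(name)

    def close(self, name):
        self.open_count[name] -= 1
        if self.open_count[name] == 0:
            del self.open_count[name]
            tmp = self.hidden.pop(name, None)
            if tmp is not None:
                self.server.remove(tmp)  # last close: now really delete it
```

Only the client that did the open knows the file is still in use. That is why the trick cannot cover a second client.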

in UNIX, permissions are checked only when a file is opened. in NFS, however, the server checks permissions on every request because the protocol is stateless, so if a file is opened and its read permission is then taken away, the file can become inaccessible. they get around this by storing the client's credentials in the file table at open time and using them on later requests.

still, even with the above two workarounds, not all UNIX semantics are preserved (e.g., if two clients open a file and one deletes it, reads by the other will fail, because the rename-on-remove trick only works within a single client).

Time Skew. ranlib, emacs, and ls were affected by clock skew between clients and servers. Different programs take timestamps from either the client or the server, so the times can disagree (e.g., ranlib writes a time from the client's clock into a file, but the file's last-modified time is set by the server).
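
The ranlib case can be reduced to a single comparison. A table-of-contents timestamp comes from the client clock, the archive's modification time comes from the server clock, and a linker-style check compares the two. The Python sketch below assumes a client clock 30 seconds behind the server; the helper name and numbers are hypothetical.

```python
def toc_is_stale(toc_written_at: float, archive_mtime: float) -> bool:
    """Linker-style check (hypothetical helper): the archive's table of contents
    is treated as out of date if the archive was modified after it was written."""
    return archive_mtime > toc_written_at

TRUE_TIME = 1000.0          # both events happen at the same real instant
client_clock_skew = -30.0   # assumption: client clock runs 30 s behind the server

toc_written_at = TRUE_TIME + client_clock_skew  # ranlib stamps with client time
archive_mtime = TRUE_TIME                       # server sets mtime from its clock
```

With no skew the check passes. With the client 30 seconds behind, a freshly built archive looks out of date even though nothing changed.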

Performance Optimizations. 

Performance Issues

Competition

AT&T's RFS sucks because it only works with UNIX System V.3, and it hadn't even been released. RFS's only advantage is that it supports full UNIX semantics, but the author would not expect client applications to be able to handle the error codes RFS can return: the analogous errors would only happen on a local disk on disk failure or a full disk, so most applications would not even attempt to recover anyway. Meanwhile, NFS works with 5 different OSes and had been in operation for 1 year, while RFS still had not been released.